Abstract

We introduce a family of one-dimensional geometric growth models, constructed iteratively by locally 
optimizing the tradeoffs between two competing metrics, and show that this family is equivalent to a 
family of preferential attachment random graph models with upper cutoffs. This is the first explanation 
of how preferential attachment can arise from a more basic underlying mechanism of local competition. 
We rigorously determine the degree distribution for the family of random graph models, showing that it 
obeys a power law up to a finite threshold and decays exponentially above this threshold. 

We also rigorously analyze a generalized version of our graph process, with two natural parameters, 
one corresponding to the cutoff and the other a "fertility" parameter. We prove that the general model 
has a power-law degree distribution up to a cutoff, and establish monotonicity of the power as a function 
of the two parameters. Limiting cases of the general model include the standard preferential attachment 
model without cutoff and the uniform attachment model. 
1 Introduction



1.1 Network growth models 

This paper is dedicated, with great affection and admiration, to Bela Bollobas on the occasion of his 60th 
birthday. Two of us (C.B. and J.T.C.) are privileged to count Bela among our dearest friends. And all of us 
have been inspired by his pioneering work on graph processes in general, and scale-free graphs in particular. 
We use the opportunity of this birthday volume to provide complete proofs of results on a new graph model, 
first announced in 

There is currently tremendous interest in understanding the mathematical structure of networks - espe- 
cially as we discover the pervasiveness of network structures in natural and engineered systems. Much recent 
theoretical work has been motivated by measurements of real-world networks, indicating they have certain 
"scale-free" properties, such as a power-law distribution of degrees. For the Internet graph, in particular, 
both the graph of routers and the graph of autonomous systems (AS) seem to obey power laws [T51 Ho] . 
However, these observed power laws hold only for a limited range of degrees, presumably due to physical 
constraints and the finite size of the Internet. 

Many random network growth models have been proposed which give rise to power-law degree distribu- 
tions. Most of these models rely on a small number of basic mechanisms, mainly preferential attachment 1 
|20lE] or copying ^S], extending ideas known for many years |13M21ll2"3"ll22| to a network context. Variants 
of the basic preferential attachment mechanism have also been proposed, and some of these lead to changes 
in the values of the exponents in the resulting power laws. For extensive reviews of work in this area, see 
Albert and Barabasi [2], Dorogovtsev and Mendes, and Newman ^5]; for a survey of the relatively
limited amount of mathematical work see jS| . Most of this work concerns network models without reference 
to an underlying geometric space. Nor do most of these models allow for heterogeneity of nodes, or address 
physical constraints on the capacity of the nodes. Thus, while such models may be quite appropriate for 
geometry-free networks, such as the web graph, they do not seem to be ideally suited to the description of 
other observed networks, e.g., the Internet graph. 

In this paper, instead of assuming preferential attachment, we show that it can arise from a more basic 
underlying process, namely competition between opposing forces. The idea that power laws can arise from 
competing effects, modeled as the solution of optimization problems with complex objectives, was proposed 
originally by Carlson and Doyle (TJ]|. Their "highly optimized tolerance" (HOT) framework has reliable 
design as a primary objective. Fabrikant, Koutsoupias and Papadimitriou (FKP) introduce an elegant 
network growth model with such a mechanism, which they called "heuristically optimized trade-offs" . As 
in many growth models, the FKP network is grown one node at a time, with each new node choosing a 
previous node to which it connects. However, in contrast to the standard preferential attachment types of 
models, a key feature of the FKP model is the underlying geometry. The nodes are points chosen uniformly 
at random from some region, for example a unit square in the plane. The trade-off is between the geometric 
consideration that it is desirable to connect to a nearby point, and a networking consideration, that it is 

1 As Aldous pi points out, proportional attachment may be a more appropriate name, stressing the linear dependence of the 
attractiveness on the degree. 
desirable to connect to a node that is "central" in the network as a graph. Centrality is measured by using,
for example, the graph distance to the initial node. The model has a tunable, but fixed, parameter, which 
determines the relative weights given to the geometric distance and the graph distance. 

The suggestion that competition between two metrics could be an alternative to preferential attachment 
for generating power-law degree distributions represents an important paradigm shift. Though FKP intro- 
duced this paradigm for network growth, and FKP networks have many interesting properties, the resulting 
distribution is not a power law in the standard sense [S]. Instead the overwhelming majority of the nodes are
leaves (degree one), and a second substantial fraction are heavily connected "stars" (hubs), producing a node
degree distribution which has clear bimodal features. 2

Here, instead of directly producing power laws as a consequence of competition between metrics, we show 
that such competition can give rise to a preferential attachment mechanism, which in turn gives rise to power 
laws. Moreover, the power laws we generate have an upper cutoff, which is more realistic in the context of 
many applications. 

1.2 Overview of competition-induced preferential attachment 

We begin by formulating a general competition model for network growth. Let $x_0, x_1, \dots, x_t$ be a sequence
of random variables with values in some space $A$. We think of the points $x_0, x_1, \dots, x_t$ arriving one at a
time according to some stochastic process. For example, we typically take $A$ to be a compact subset of $\mathbb{R}^d$,
$x_0$ to be a given point, say the origin, and $x_1, \dots, x_t$ to be i.i.d. uniform on $A$. The network at time $t$ will
be represented by a graph, $G(t)$, on $t + 1$ vertices, labeled $0, 1, \dots, t$, and at each time step, the new node
attaches to one or several nodes in the existing network. For simplicity, here we assume that each new node
connects to a single node, resulting in $G(t)$ being a tree.

Given $G(t-1)$, the new node, labeled $t$, attaches to that node $j$ in the existing network that minimizes
a certain cost function representing the trade-off of two competing effects, namely connection or startup
cost, and routing or performance cost. The connection cost is represented by a metric, $g_{ij}(t)$, on $\{0, \dots, t\}$
which depends on $x_0, \dots, x_t$, but not on the current graph $G(t-1)$, while the routing cost is represented by
a function, $h_j(t-1)$, on the nodes which depends on the current graph, but not on the physical locations
$x_0, \dots, x_t$ of the nodes $0, \dots, t$. This leads to the cost function

$$c_t = \min_j \left[\alpha g_{tj}(t) + h_j(t-1)\right], \tag{1}$$

where $\alpha$ is a constant which determines the relative weighting between connection and routing costs. We
think of the function $h_j(t-1)$ as measuring the centrality of the node $j$; for simplicity, we take it to be the
hop distance along the graph $G(t-1)$ from $j$ to the root 0.

To simplify the analysis of the random graph process, we will assume that nodes always choose to connect
to a point which is closer to the root, i.e., they minimize the cost function

$$\tilde{c}_t = \min_{j \,:\, \|x_j\| < \|x_t\|} \left[\alpha g_{tj}(t) + h_j(t-1)\right], \tag{2}$$

2 In simulations of the FKP model, this can be clearly discerned by examining the probability distribution function (pdf); 
for the system sizes amenable to simulations, it is less prominent in the cumulative distribution function (cdf). 
where $\|\cdot\|$ is an appropriate norm.

In the original FKP model, $A$ is a compact subset of $\mathbb{R}^2$, say the unit square, and the points $x_i$ are
independently uniformly distributed on $A$. The cost function is of the form (1) with $g_{ij} = d_{ij}$, the Euclidean
metric (modeling the cost of building the physical transmission line), and $h_j(t)$ is the hop distance along
the existing network $G(t)$ from $j$ to the root. A rigorous analysis of the degree distribution of this
two-dimensional model was given in and the analogous one-dimensional problem was treated in [17].

Our model is defined as follows. 

Definition 1 (Border Toll Optimization Process). Let $x_0 = 0$, and let $x_1, x_2, \dots$ be i.i.d., uniformly
at random in the unit interval $A = [0, 1]$, and let $G(t)$ be the following process: At $t = 0$, $G(t)$ consists of a
single vertex 0, the root. Let $h_j(t)$ be the hop distance to 0 along $G(t)$, and let $g_{ij}(t) = n_{ij}(t)$ be the number
of existing nodes between $x_i$ and $x_j$ at time $t$, which we refer to as the jump cost of $i$ connecting to $j$. Given
$G(t-1)$ at time $t-1$, a new vertex, labeled $t$, attaches to the node $j$ which minimizes the cost function (2).
Furthermore, if there are several nodes $j$ that minimize this cost function and satisfy the constraint, we
choose the one whose position $x_j$ is nearest to $x_t$. The process so defined is called the border toll optimization
process (BTOP).

As in the FKP model, the routing cost is just the hop distance to the root along the existing network. 
However, in our model the connection cost metric measures the number of "borders" between two nodes: 
hence the name BTOP. Note the correspondence to the Internet, where the principal connection cost is related 
to the number of AS domains crossed - representing, e.g., the overhead associated with BGP, monetary costs 
of peering agreements, etc. In order to facilitate a rigorous analysis of our model, we took the simpler cost
function (2), so that the new node always attaches to a node to its left.
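
The definition can be turned into a short simulation. The following is a minimal sketch rather than the authors' code: the function name `btop_parents` and its argument layout are assumptions made for illustration, the positions are passed in explicitly so that the run is reproducible, and the cost is evaluated with exact fractions so that ties are detected reliably.

```python
from fractions import Fraction


def btop_parents(alpha, positions):
    """Grow a BTOP tree (hypothetical helper, not from the paper).

    alpha     -- weight of the jump cost (a Fraction keeps ties exact)
    positions -- x_1, x_2, ... in (0, 1), assumed pairwise distinct
    Returns parent[t] for every node t (parent[0] is None for the root).
    """
    x = [0.0]        # x_0 = 0 is the root
    hops = [0]       # h_j: hop distance from j to the root
    parent = [None]
    for xt in positions:
        best_key, best_j = None, None
        for j, xj in enumerate(x):
            if xj >= xt:           # only nodes to the left are allowed
                continue
            # jump cost: existing nodes strictly between x_j and x_t
            n_tj = sum(1 for xk in x if xj < xk < xt)
            cost = alpha * n_tj + hops[j]
            key = (cost, xt - xj)  # ties in cost go to the nearest node
            if best_key is None or key < best_key:
                best_key, best_j = key, j
        x.append(xt)
        parent.append(best_j)
        hops.append(hops[best_j] + 1)
    return parent


# alpha = 1/3; the four arrivals land at 0.5, 0.3, 0.7 and 0.9
print(btop_parents(Fraction(1, 3), [0.5, 0.3, 0.7, 0.9]))  # [None, 0, 0, 0, 3]
```

In this run the first three arrivals attach to the root. For the arrival at 0.9, connecting to the root (three borders, no hops) and connecting to node 3 at 0.7 (no borders, one hop) both cost 1, so the tie-breaking rule selects the nearer node 3.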

It is interesting to note that the ratio of the BTOP connection cost metric to that of the one-dimensional
FKP model is just the local density of nodes: $n_{ij}/d_{ij} = \rho_{ij}$. Thus the transformation between the two
models is equivalent to replacing the constant parameter $\alpha$ in the FKP model with a variable parameter
$\alpha_{ij} = \alpha\rho_{ij}$ which changes as the network evolves in time. That $\alpha_{ij}$ is proportional to the local density of
nodes in the network reflects a model with an increase in cost for local resources that are scarce or in high
demand. Alternatively, it can be thought of as reflecting the economic advantages of being first to market.

Somewhat surprisingly, the BTOP is equivalent to a special case of the following process, which closely 
parallels the preferential attachment model and makes no reference to any underlying geometry. 

Definition 2 (Generalized Preferential Attachment with Fertility and Aging). Let $A_1, A_2$ be two
positive integer-valued parameters. Let $G(t)$ be the following Markov process, whose states are finite rooted
trees in which each node is labeled either fertile or infertile. At time $t = 0$, $G(t)$ consists of a single fertile
vertex. Given the graph at time $t$, the new graph is formed in two steps: first, a new vertex, labeled $t + 1$ and
initialized as infertile, connects to an old vertex $j$ with probability zero if $j$ is infertile, and with probability

$$\Pr(t + 1 \to j) = \frac{\min\{d_j(t), A_2\}}{W(t)} \tag{3}$$

if $j$ is fertile. Here, $d_j(t)$ is equal to 1 plus the out-degree of $j$, and $W(t) = \sum_j \min\{d_j(t), A_2\}$ with the sum
running over fertile vertices only. We refer to vertex $t + 1$ as a child of $j$. If, after the first step, $j$ has more
than $A_1 - 1$ infertile children, one of them, chosen uniformly at random, becomes fertile. The process so
defined is called a generalized preferential attachment process with fertility threshold $A_1$ and aging threshold
$A_2$.

The special case $A_1 = A_2$ is called the competition-induced preferential attachment process with
parameter $A_1$.
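
The two-step update of Definition 2 can be sketched in a few lines. The code below is an illustrative sketch only: the names `simulate_gpa` and `total_weight`, the list-based tree representation and the seeding are assumptions of this example, not part of the model's definition.

```python
import random


def simulate_gpa(a1, a2, steps, seed=0):
    """Sketch of Definition 2; returns (parent, children, fertile)."""
    rng = random.Random(seed)
    parent, children, fertile = [None], [[]], [True]  # single fertile root
    for t in range(steps):
        new = t + 1
        # step 1: choose a fertile vertex j with weight min{d_j, A_2}
        candidates = [j for j in range(len(parent)) if fertile[j]]
        weights = [min(1 + len(children[j]), a2) for j in candidates]
        j = rng.choices(candidates, weights=weights)[0]
        parent.append(j)
        children.append([])
        fertile.append(False)          # the new vertex starts infertile
        children[j].append(new)
        # step 2: with more than A_1 - 1 infertile children, promote one
        infertile_kids = [c for c in children[j] if not fertile[c]]
        if len(infertile_kids) > a1 - 1:
            fertile[rng.choice(infertile_kids)] = True
    return parent, children, fertile


def total_weight(children, fertile, a2):
    """W(t): sum of min{d_j, A_2} over fertile vertices."""
    return sum(min(1 + len(children[j]), a2)
               for j in range(len(fertile)) if fertile[j])


parent, children, fertile = simulate_gpa(a1=3, a2=3, steps=200)
print(total_weight(children, fertile, 3))  # 201
```

For $A_1 = A_2$ every arrival raises the total weight by exactly one, so $W(t) = t + 1$; this is consistent with the attachment probability $\min\{A, k_i\}/(t + 1)$ derived in Section 2.2.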

The last definition is motivated by the following theorem, to be proved in Section 2. To state the
theorem, we define a graph process as a random sequence of graphs G(0), G(l), G(2), ... on the vertex sets 
{0}, {0,1}, {0,1, 2},..., respectively. 

Theorem 1. As a graph process, the border toll optimization process has the same distribution as the
competition-induced preferential attachment process with parameter $A = \lceil \alpha^{-1} \rceil$.

Certain other limiting cases of the generalized preferential attachment process are worth noting. If $A_1 = 1$
and $A_2 = \infty$, we recover the standard model of preferential attachment as considered in |201 14*). If $A_1 = 1$
and $A_2$ is finite, the model is equivalent to the standard model of preferential attachment with a cutoff. On
the other hand, if $A_1 = A_2 = 1$, we get a uniform attachment model.

The degree distribution of our random trees is characterized by the following theorem, which asserts that
almost surely (a.s.) the fraction of vertices having degree $k$ converges to a specified limit $q_k$, and moreover
that this limit obeys a power law for $k < A_2$, and decays exponentially above $A_2$.

Theorem 2. Let $A_1, A_2$ be positive integers. Consider the generalized preferential attachment process with
fertility parameter $A_1$ and aging parameter $A_2$. Let $N_0(t)$ be the number of infertile vertices at time $t$, and
let $N_k(t)$ be the number of fertile vertices with $k - 1$ children at time $t$, $k \ge 1$. Then:

1. There are numbers $q_k \in [0, 1]$ such that, for all $k \ge 0$,

$$\frac{N_k(t)}{t + 1} \to q_k \quad \text{a.s., as } t \to \infty. \tag{4}$$

2. There exists a number $w = w(A_1, A_2) \in [0, 2]$ such that the $q_k$ are determined by the following equations:

$$q_k = \left(\prod_{i=2}^{k} \frac{i - 1}{i + w}\right) q_1 \quad \text{if } 1 \le k \le A_2, \tag{5}$$

$$q_k = \left(\frac{A_2}{A_2 + w}\right)^{k - A_2} q_{A_2} \quad \text{if } k > A_2, \tag{6}$$

$$1 = \sum_{i=0}^{\infty} q_i \quad \text{and} \quad q_0 = \sum_{i=1}^{\infty} q_i \min\{i - 1, A_1 - 1\}.$$

3. There are positive constants $c_1$ and $C_1$, independent of $A_1$ and $A_2$, such that

$$c_1 k^{-(w+1)} < q_k/q_1 < C_1 k^{-(w+1)} \tag{7}$$

for $1 \le k \le A_2$.

4. If $A_1 = A_2$, the parameter $w$ is equal to 1, and for general $A_1$ and $A_2$, $w$ decreases with increasing $A_1$,
and increases with increasing $A_2$.

Equation (7) clearly defines a power-law degree distribution with exponent $\gamma = w + 1$ for $k < A_2$. Note
that for measurements of the Internet the value of the exponent for the power law is $\gamma \approx 2$. In our border
toll optimization model, where $A_1 = A_2$, we recover $\gamma = 2$.
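
For a given value of $w$, equations (5) and (6) together with the two normalization conditions fix all the $q_k$. The sketch below evaluates them numerically. It is only an illustration under stated assumptions: the function name `limit_distribution` is hypothetical, $w$ is supplied by the caller (here $w = 1$, the value the theorem gives for $A_1 = A_2$), and the infinite sums are truncated at `k_max`, which is harmless only when the geometric tail in (6) has decayed by then.

```python
def limit_distribution(w, a1, a2, k_max=2000):
    """Return [q_0, q_1, ..., q_{k_max}] from (5), (6) and the normalization."""
    # r[k] = q_k / q_1 for k >= 1 (index 0 is filled in afterwards)
    r = [0.0] * (k_max + 1)
    r[1] = 1.0
    for k in range(2, k_max + 1):
        if k <= a2:
            r[k] = r[k - 1] * (k - 1) / (k + w)   # power-law regime, (5)
        else:
            r[k] = r[k - 1] * a2 / (a2 + w)       # exponential regime, (6)
    # q_0 / q_1 from q_0 = sum_i q_i * min{i - 1, A_1 - 1}
    r0 = sum(r[i] * min(i - 1, a1 - 1) for i in range(1, k_max + 1))
    # q_1 from 1 = sum_{i >= 0} q_i
    q1 = 1.0 / (r0 + sum(r[1:]))
    return [q1 * r0] + [q1 * v for v in r[1:]]


q = limit_distribution(w=1.0, a1=3, a2=3)
print(round(q[0], 6), round(q[1], 6))  # 0.454545 0.272727
```

For $A_1 = A_2 = 3$ and $w = 1$ the sums can also be done by hand, giving $q_0 = 5/11$ and $q_1 = 3/11$, which matches the printed values.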

The convergence claim of Theorem 2 is proved using a novel method which we believe is one of the
main technical contributions of this work. For preferential attachment models which have been analyzed 
in the past \7\ E| j the convergence was established using the Azuma-Hoeffding martingale inequality. 
To establish the bounded-differences hypothesis required by that inequality, those proofs employed a clever 
coupling of the random decisions made by the various edges, such that the decisions made by an edge e 
only influence the decisions of subsequent edges which choose to imitate e's choices. A consequence of this 
coupling is that if e made a different decision, it would alter the degrees of only finitely many vertices. This 
in turn allows the required bounded-differences hypothesis to be established. No such approach is available 
for our models, because the coupling fails. The random decisions made by an edge e may influence the time 
at which some node v crosses the fertility or aging threshold, which thereby exerts a subtle influence on the 
decisions of every future edge, not only those which choose to imitate e. 

Instead we introduce a new approach based on the second-moment method. The argument establish- 
ing the requisite second-moment upper bound is quite subtle; it depends on a computation involving the 
eigenvalues of a matrix describing the evolution of the degree sequence in a continuous-time version of the 
model. 

2 Equivalence of the Two Models 

2.1 Basic properties of the border toll optimization process 

In this section we will turn to the BTOP defined in the introduction, establishing some basic properties 
which will enable us to prove that it is equivalent to the competition-induced preferential attachment model. 
In order to avoid complications we exclude the case that some of the $x_i$'s are identical, an event that has
probability zero. We say that $j \in \{0, 1, \dots, t\}$ lies to the right of $i \in \{0, 1, \dots, t\}$ if $x_i < x_j$, and we say that
$j$ lies directly to the right of $i$ if $x_i < x_j$ but there is no $k \in \{1, \dots, t\}$ such that $x_i < x_k < x_j$. In a similar
way, we say that $j$ is the first vertex with a certain property to the right of $i$ if $j$ has that property and there
exists no $k \in \{1, \dots, t\}$ such that $x_i < x_k < x_j$ and $k$ has the property in question. Similar notions apply
with "left" in place of "right".

Definition 3. A vertex $i$ is called fertile at time $t$ if a hypothetical new point arriving at time $t + 1$ and
landing directly to the right of $i$ would attach itself to the node $i$. Otherwise $i$ is called infertile at time $t$.

This definition is illustrated in Fig. 1.

Lemma 3. Let $0 < \alpha < \infty$, let $A = \lceil \alpha^{-1} \rceil$, and let $0 \le t < \infty$. Then
i) The node 0 is fertile at time $t$.
Figure 1: A sample instance of BTOP for $\alpha = 1/3$, $A = 3$, showing the process on the unit interval (on
the left), and the resulting tree (on the right). Fertile vertices are shaded, infertile ones are not. Note that
vertex 1 became fertile at $t = 3$.

ii) Let $i$ be fertile at time $t$. If $i$ is the rightmost fertile vertex at time $t$ (case 1), let $\ell$ be the number of
infertile vertices to the right of $i$. Otherwise (case 2), let $j$ be the next fertile vertex to the right of $i$, and let
$\ell = n_{ij}(t)$. Then $0 \le \ell \le A - 1$, and the $\ell$ infertile vertices located directly to the right of $i$ are children of $i$.
In case 2, if $h_j > h_i$, then $j$ is a fertile child of $i$ and $\ell = A - 1$. As a consequence, the hop count between
two consecutive fertile vertices never increases by more than 1 as we move to the right, and if it increases
by 1, there are $A - 1$ infertile vertices between the two fertile ones.

iii) Assume that the new vertex at time $t + 1$ lands between two consecutive fertile vertices $i$ and $j$, and
let $\ell = n_{ij}(t)$. Then $t + 1$ becomes a child of $i$. If $\ell + 1 < A$, the new vertex is infertile at time $t + 1$, and
the fertility of all old vertices is unchanged. If $\ell + 1 = A$ and the new vertex lies directly to the left of $j$, the
new vertex is fertile at time $t + 1$ and the fertility of the old vertices is unchanged. If $\ell + 1 = A$ and the new
vertex does not lie directly to the left of $j$, the new vertex is infertile at time $t + 1$, the vertex directly to the
left of $j$ becomes fertile, and the fertility of all other vertices is unchanged.

iv) If $t + 1$ lands to the right of the rightmost fertile vertex at time $t$, the statements in iii) hold with $j$
replaced by the right endpoint of the interval $[0, 1]$, and $n_{ij}(t)$ replaced by the number of vertices to the right
of $i$.

v) If i is fertile at time t, it is still fertile at time t+1. 
vi) If $i$ has $k$ children at time $t$, the $\ell = \min\{A - 1, k\}$ leftmost of them are infertile at time $t$, and any
others are fertile.

Proof. The proof is straightforward but lengthy. We include the details of the argument here for complete- 
ness. 

Statement i) is trivial, statement v) follows immediately from iii) and iv), and vi) follows immediately 
from ii). So we are left with ii) to iv). We proceed by induction on t. If ii) holds at time t, and iii) and iv)
hold for a new vertex arriving at time t + 1, ii) clearly also holds at time t + 1. We therefore only have to 
prove that ii) at time t implies iii) and iv) for a new vertex arriving at time t + 1. 

Assume thus that ii) holds at time t. At time t + 1, a new vertex arrives, and falls directly to the right 
of some vertex k. Let i be the nearest vertex to the left of k that was fertile at time t (if k is fertile at time 
t, we set i = k) and let j be the nearest vertex to the right of i that was fertile at time t (we assume for the 
moment that i is not the rightmost fertile vertex at time t), and let $\ell$ be the number of vertices between i and j
at time t.

Let us first prove that the vertex $t + 1$ connects to $i$. If $i = k$, this is obvious, since $i$ is fertile at time
$t$. We may therefore assume that $k \ne i$. For the new vertex the cost of connecting to the vertex $i$ is
then equal to $\alpha(n_{ik}(t) + 1)$. Let us first compare this cost to the cost of connecting to a fertile vertex $i'$ to
the left of $i$. Let $i_0 = i'$, let $i_s = i$, and let $i_1, \dots, i_{s-1}$ be the fertile vertices between $i'$ and $i$, ordered from
left to right. If $h_{i_{m-1}} < h_{i_m}$, we use the inductive assumption ii) to conclude that the number of infertile
vertices between $i_{m-1}$ and $i_m$ is equal to $A - 1$, and $h_{i_{m-1}} = h_{i_m} - 1$. A decrease of $q$ in the hop cost is
therefore accompanied by an increase in the jump cost of at least $\alpha A q \ge q$. As a consequence, it never pays
to connect to a fertile vertex $i'$ to the left of $i$. The cost of connecting to an infertile vertex to the left of $i$
is even higher, since the hop count of an infertile vertex is at best equal to the hop count of the next fertile
vertex to the right. We therefore only have to consider the connection cost to some of the infertile children
of $i$. But again, the hop count is worse by 1 when compared to the hop count of $i$, and the jump cost is at
best reduced by $(A - 1)\alpha < 1$, proving that the cost of connecting to $i$ is minimal.

To discuss the fertility of the vertices in the graph G(t +1), we need to consider the arrival of a second 
vertex, labeled t + 2. If t + 2 falls to the left of t + 1, it will face an optimization problem that has not been 
changed by the arrival of the vertex t + 1, implying that the fertility of the vertices to the left of t + 1 is 
unchanged. If t + 2 falls to the right of j, the cost of connecting to j or one of the vertices to the right of 
j is the same as before, and the cost of connecting to a vertex to the left of j is at best equal (the cost of 
connecting to any vertex to the left of t + 1 is in fact higher, due to the additional cost of jumping over the 
vertex t + 1). Therefore, the vertex t + 2 will still prefer to connect to either j or one of the vertices to the 
right of j, implying that the fertility of the vertices to the right of j has not changed at all. We therefore are 
left with analyzing the case where $t + 2$ falls between $t + 1$ and $j$. Again, the vertex $t + 2$ will prefer $i$ over
any vertex to the left of $i$ (the cost analysis is the same as the one used for $t + 1$ above), so we just have to
compare the costs of connecting to the different vertices between $i$ and $j$. If $\ell + 1 < A$, this will again imply
that $t + 2$ connects to $i$; but if $\ell + 1 = A$, the vertex $t + 2$ will only connect to $i$ if it does not fall to the right
of the rightmost of the now $\ell + 1$ vertices between $i$ and $j$. If it falls to the right of this vertex, it will be
as expensive to connect to the rightmost of the now $\ell + 1$ vertices between $i$ and $j$ as it is to connect to $i$.
Recalling our convention of connecting to the nearest vertex to the left if there is a tie in costs, this proves
that now $t + 2$ connects to the rightmost vertex between $i$ and $j$, implying that this vertex is fertile.

The above considerations prove the fertility statements in iii), and thus complete the proof of iii). The
case where i is the rightmost fertile vertex at time t is similar (in fact, it is slightly easier since it involves
fewer cases), and leads to the proof of iv). This completes the proof of Lemma 3. □

2.2 Proof of Theorem 1

In the BTOP, note that our cost function

$$\min_j \left[\alpha n_{tj}(t) + h_j(t-1)\right], \tag{8}$$

and hence the graph $G(t)$, only depends on the order of the vertices $x_0, \dots, x_t$, and not on their actual
positions in the interval $[0, 1]$. Let $\pi(t)$ be the permutation of $\{0, 1, \dots, t\}$ which orders the vertices $x_0, \dots, x_t$
from left to right, so that

$$x_0 = x_{\pi_0(t)} < x_{\pi_1(t)} < \cdots < x_{\pi_t(t)}. \tag{9}$$

(Recall that the vertices $x_0, x_1, \dots, x_t$ are pairwise distinct with probability one.) Note that $\pi(t)$ and $\pi(t + 1)$
are related as follows: there exists $i_0 \in \{1, 2, \dots, t + 1\}$ such that

$$\pi_i(t+1) = \begin{cases} \pi_i(t) & \text{if } i < i_0 \\ t + 1 & \text{if } i = i_0 \\ \pi_{i-1}(t) & \text{if } i > i_0. \end{cases} \tag{10}$$

Informally, the permutation $\pi(t + 1)$ is obtained by inserting the new element $t + 1$ into the permutation
$\pi(t)$ in a random position $i_0$, where $x_{\pi_{i_0 - 1}(t)}$ is the left endpoint of the subinterval of $(0, 1)$ into which $x_{t+1}$
falls. The distribution of the random variable $i_0$ may be deduced as follows. Since $x_0 = 0$ and $x_1, x_2, \dots, x_t$
are i.i.d., we know that, for all $t$, the permutation $\pi(t)$ is uniformly distributed among permutations of
$\{0, 1, \dots, t\}$ which fix the element 0. This means that, conditioned on a given such permutation $\pi(t)$, the
permutation $\pi(t + 1)$ is uniformly distributed among all permutations related to $\pi(t)$ by the transformation
(10). In other words, $i_0$ is uniformly distributed in the set $\{1, 2, \dots, t + 1\}$.

With the help of Lemma 3 we now easily derive a description of the graph $G(t)$ which does not involve
any optimization problem. To this end, let us consider a fertile vertex $i$ with $\ell$ infertile children at time $t$. If a
new vertex falls into the interval directly to the right of $i$, or into one of the intervals directly to the right
of an infertile child of $i$, it will connect to the vertex $i$. Since there is a total of $t + 1$ intervals at time $t$, the
probability that a vertex $i$ with $\ell$ infertile children grows an offspring is $(\ell + 1)/(t + 1)$. By Lemma 3 vi),
this number is equal to $\min\{A, k_i\}/(t + 1)$, where $k_i - 1$ is the number of children of $i$. Note that fertile
children do not contribute to this probability, since vertices falling into an interval directly to the right of a
fertile child will connect to the child, not the parent.

Assume now that $i$ did get a new offspring, and that it had $A - 1$ infertile children at time $t$. Then the
new vertex is either born fertile, or makes one of its infertile siblings fertile. Using the principle of deferred
decisions, we may assume that with probability $1/A$ the new vertex becomes fertile, and with probability
$(A - 1)/A$ an old one, chosen uniformly at random among the $A - 1$ candidates, becomes fertile.

We thus have shown that the solution $G(t)$ of the optimization problem (8) can alternatively be described
by the competition-induced preferential attachment model with parameter $A$.
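
The interval-counting step of this proof lends itself to a numerical sanity check. The sketch below grows a BTOP tree, probes each of the $t + 1$ intervals with a hypothetical arrival, and counts how many intervals lead to each vertex. It is an illustration only: the helper names `choose_parent` and `gap_counts`, the seed and the instance size are assumptions of this example, and exact fractions are used so that cost ties are resolved as in Definition 1.

```python
import random
from collections import Counter
from fractions import Fraction
from math import ceil


def choose_parent(alpha, x, hops, x_new):
    """Node that a point at x_new attaches to (left constraint, jump cost n)."""
    best_key, best_j = None, None
    for j, xj in enumerate(x):
        if xj >= x_new:
            continue
        n = sum(1 for xk in x if xj < xk < x_new)
        key = (alpha * n + hops[j], x_new - xj)  # nearest node wins ties
        if best_key is None or key < best_key:
            best_key, best_j = key, j
    return best_j


def gap_counts(alpha, positions):
    """Build the tree, then probe every interval with a hypothetical arrival."""
    x, hops, kids = [0.0], [0], [0]
    for x_new in positions:
        j = choose_parent(alpha, x, hops, x_new)
        x.append(x_new)
        hops.append(hops[j] + 1)
        kids.append(0)
        kids[j] += 1
    ends = sorted(x) + [1.0]
    probes = [(lo + hi) / 2 for lo, hi in zip(ends, ends[1:])]
    counts = Counter(choose_parent(alpha, x, hops, p) for p in probes)
    return counts, kids


alpha = Fraction(1, 3)
A = ceil(1 / alpha)
rng = random.Random(1)
counts, kids = gap_counts(alpha, [rng.random() for _ in range(40)])
# every vertex that can be chosen should own min{A, k_i} of the t + 1 intervals
print(all(c == min(A, 1 + kids[i]) for i, c in counts.items()))
```

If the argument above is implemented faithfully, the check prints `True`, and the vertices that appear in `counts` are exactly the fertile ones.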

3 Convergence of the Degree Distribution 

3.1 Overview 

To characterize the behavior of the degree distribution, we will derive a recursion which governs the evolution 
of the vector N(t), whose components are the number of vertices of each degree, at the time when there are 
t nodes in the network. The conditional expectation of N(t + 1) is given by an evolution equation of the 
form 

$$E\left(N(t + 1) - N(t) \mid N(t)\right) = M(t) N(t),$$

where $M(t)$ depends on $t$ through the random variable $W(t)$ introduced in Definition 2. Due to the
randomness of the coefficient matrix $M(t)$, the analysis of this evolution equation is not straightforward. We
avoid this problem by introducing a continuous-time process, with time parameter $\tau$, which is equivalent to
the original discrete-time process up to a (random) reparametrization of the time coordinate. The evolution
equation for the conditional expectations in the continuous-time process involves a coefficient matrix $M$
that is not random and does not depend on $\tau$. We will first prove that the expected degree distribution
in the continuous-time model converges to a scalar multiple of the eigenvector p of M associated with the 
largest eigenvalue w. This is followed by the much more difficult proof that the empirical degree distribution 
converges a.s. to the same limit. Finally, we translate this continuous-time result into a rigorous convergence 
result for the original discrete-time system. 

3.2 Notation 

Let $A$ be any integer greater than or equal to $\max(A_1, A_2)$. Let $N_0(t)$ be the number of infertile vertices at
(discrete) time $t$, and, for $k \ge 1$, let $N_k(t)$ be the number of fertile vertices with $k - 1$ children at time $t$. Let
$\tilde{N}_A(t) = N_{\ge A}(t) = \sum_{k \ge A} N_k(t)$, and $\tilde{N}_k(t) = N_k(t)$ if $k < A$. The combined attractiveness of all vertices
is denoted by $W(t) = \sum_{k=1}^{A} \min\{k, A_2\} \tilde{N}_k(t)$. Let $n_k(t) = \frac{1}{t+1} N_k(t)$ and $\tilde{n}_k(t) = \frac{1}{t+1} \tilde{N}_k(t)$. Finally, the
vectors $(\tilde{N}_k(t))_{k=1}^{A}$ and $(\tilde{n}_k(t))_{k=1}^{A}$ are denoted by $\tilde{N}(t)$ and $\tilde{n}(t)$ respectively. Note that the index $k$ runs
from 1 to $A$, not 0 to $A$.

3.3 Evolution of the expected value 

From the definition of the generalized preferential attachment model, it is easy to derive the probabilities 
for the various alternatives which may happen upon the arrival of the (t + l)-st node: 

• With probability $A_2 \tilde{N}_A(t)/W(t)$, it attaches to a node of degree $\ge A$. This increments $N_1$, and leaves
$\tilde{N}_A$ and all $N_j$ with $1 < j < A$ unchanged.

• With probability $\min(A_2, k) N_k(t)/W(t)$, it attaches to a node of degree $k$, where $1 \le k < A$. This
increments $\tilde{N}_{k+1}$, decrements $N_k$, increments $N_0$ or $N_1$ depending on whether $k < A_1$ or $k \ge A_1$, and
leaves all other $N_j$ with $j < A$ unchanged.

It follows that the discrete-time process $(\tilde{N}_k(t))_{k=0}^{A}$ at time $t$ is equivalent to the state of the following
continuous-time stochastic process $(\tilde{N}_k(\tau))_{k=0}^{A}$ at the random stopping time $\tau = \tau_t$ of the $t$-th event.

• With rate $A_2 \tilde{N}_A(\tau)$, $N_1$ increases by 1.

• For every $0 < k < A$, with rate $N_k(\tau) \min(k, A_2)$, the following happens:

$$N_k \to N_k - 1; \quad N_{k+1} \to N_{k+1} + 1; \quad N_{g(k)} \to N_{g(k)} + 1,$$

where $g(k) = 0$ for $k < A_1$ and $g(k) = 1$ otherwise.

Note that the above rules need to be modified if $A_1 = 1$. Here the birth of a child of a degree-one vertex
does not change the net number of fertile degree-one vertices, $N_1$.
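Let $M$ be the following $A \times A$ matrix:

$$M_{ij} = \begin{cases} -1 & \text{if } i = j = 1 < A_1 \\ -\min(j, A_2) & \text{if } 2 \le i = j \le A - 1 \\ \min(j, A_2) & \text{if } 2 \le i = j + 1 \le A \\ \min(j, A_2) & \text{if } i = 1 \text{ and } j \ge \max(A_1, 2) \\ 0 & \text{otherwise.} \end{cases} \tag{11}$$

Then, for every $\tau > \sigma$, the conditional expectation of the vector $\tilde{N}(\tau) = (\tilde{N}_k(\tau))_{k=1}^{A}$ is given by

$$E\left(\tilde{N}(\tau) \mid \tilde{N}(\sigma)\right) = e^{(\tau - \sigma) M} \tilde{N}(\sigma). \tag{12}$$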

It is easy to see that the matrix $e^{M}$ has all positive entries, and therefore (by the Perron-Frobenius Theorem)
$M$ has a unique eigenvector $p$ of $\ell_1$-norm 1 having all positive entries. Let $w$ be the eigenvalue corresponding
to $p$. Then $w$ is real, it has multiplicity 1, and it exceeds the real part of every other eigenvalue. Therefore,
for every non-zero vector $y$ with non-negative entries,

$$\lim_{\tau \to \infty} e^{-\tau w} e^{\tau M} y = \langle a, y \rangle p,$$

where $a$ is the eigenvector of $M^T$ corresponding to $w$, normalized so that $\langle a, p \rangle = 1$. Note that $\langle a, y \rangle > 0$
because $y$ is non-zero and non-negative, and $a$ is positive, again by Perron-Frobenius. Therefore, the vector
$e^{-\tau w} E(\tilde{N}(\tau))$ converges to a positive scalar multiple of $p$, say $\lambda p$, as $\tau \to \infty$.
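
The Perron-Frobenius eigenpair can be approximated numerically for small parameter values. The sketch below builds the matrix $M$ of (11) and runs a power iteration on a shifted copy of it. This is an illustrative technique chosen for this example, not the method used in the proofs; the names `build_matrix` and `perron`, the argument `size` (standing for the integer $A$ of Section 3.2), the shift and the iteration count are all assumptions.

```python
def build_matrix(a1, a2, size):
    """Matrix M of (11); `size` is the integer A >= max(A_1, A_2), size >= 2."""
    M = [[0.0] * size for _ in range(size)]
    for i in range(1, size + 1):          # 1-based indices as in the text
        for j in range(1, size + 1):
            if i == j == 1 and 1 < a1:
                M[i - 1][j - 1] = -1.0
            elif 2 <= i == j <= size - 1:
                M[i - 1][j - 1] = -float(min(j, a2))
            elif 2 <= i == j + 1 <= size:
                M[i - 1][j - 1] = float(min(j, a2))
            elif i == 1 and j >= max(a1, 2):
                M[i - 1][j - 1] = float(min(j, a2))
    return M


def perron(M, iters=5000):
    """Power iteration on M + c*I, with c making the matrix non-negative."""
    n = len(M)
    shift = 1.0 + max(-M[i][i] for i in range(n))
    p = [1.0 / n] * n
    for _ in range(iters):
        q = [shift * p[i] + sum(M[i][j] * p[j] for j in range(n))
             for i in range(n)]
        total = sum(q)
        p = [v / total for v in q]         # keep the l1-norm equal to 1
    # p has l1-norm 1 and M p = w p, so the entries of M p sum to w
    w = sum(M[i][j] * p[j] for i in range(n) for j in range(n))
    return w, p


w, p = perron(build_matrix(3, 3, 3))
print(round(w, 6), round(p[1] / p[0], 6))  # 1.0 0.333333
```

For $A_1 = A_2 = 3$ the computed eigenvalue is $w = 1$, in line with part 4 of Theorem 2, and the ratio $p_2/p_1 = 1/3$ agrees with the factor $(i - 1)/(i + w)$ in (5) at $i = 2$.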

In order to prove concentration for the continuous-time model, we will prove that the difference
$N_k(\tau)/q_k - N_j(\tau)/q_j$ has an exponential growth rate which is at most the real part of the second eigenvalue of $M$, which
is strictly less than $w$, the growth rate of the individual terms $N_k(\tau)/q_k$ and $N_j(\tau)/q_j$. From this, we will
conclude that the ratio $N_k(\tau)/N_j(\tau)$ converges almost surely to $q_k/q_j$, for all $k$ and $j$, which in turn implies
the convergence of the normalized degree sequence to the vector $(q_k)_{k \ge 0}$.

In order to prove bounds on the growth rate of the differences $N_k(\tau)/q_k - N_j(\tau)/q_j$, we will need some
auxiliary bounds involving the well-known standard birth process, to be defined below.
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3.4 Standard birth process 

We start with the definition of the standard birth process with rate $\rho$. The standard birth process was first introduced by Yule in 1924 [22], and is a special case of the well-known Yule Process, defined in that paper.

Definition 4. Let $\rho > 0$ and let $\{\sigma_n\}_{n=1}^{\infty}$ be independent exponential random variables so that $\mathbb{E}(\sigma_n) = \frac{1}{\rho} n^{-1}$. For $\tau \in [0, \infty)$, let $X_\tau = \min\{n \ge 1 : \sum_{k=1}^{n} \sigma_k > \tau\}$. Then $X$ is called the standard birth process with rate $\rho$.³
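Definition 4 translates directly into a sampling procedure. The sketch below is an illustration only (the function names are hypothetical): it draws $X_\tau$ by adding independent exponential waiting times, where the $n$-th waiting time has mean $1/(\rho n)$, and compares the empirical mean with $\exp(\rho\tau)$, the value given in (15).

```python
import math
import random


def sample_birth_process(rho, tau, rng):
    """Draw X_tau: the smallest n >= 1 with sigma_1 + ... + sigma_n > tau."""
    n, elapsed = 1, 0.0
    while True:
        # sigma_n is exponential with mean 1/(rho*n), i.e. rate rho*n
        elapsed += rng.expovariate(rho * n)
        if elapsed > tau:
            return n
        n += 1


def empirical_mean(rho, tau, trials, seed):
    """Average of `trials` independent samples of X_tau (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(sample_birth_process(rho, tau, rng) for _ in range(trials)) / trials


def exact_mean(rho, tau):
    """E(X_tau) = exp(rho * tau), as stated in (15)."""
    return math.exp(rho * tau)
```

A call such as `empirical_mean(1.0, 1.0, 20000, 0)` should land close to `exact_mean(1.0, 1.0)`, with fluctuations of the order of the standard error.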

The standard birth process is connected to our discussion through the following easy claim:

Claim 1. Let $\|N(\tau)\| = \sum_{k=1}^{A} N_k(\tau)$. Let $T > 0$, let $x > y$, and let $X$ be a standard birth process with rate 2. Then $\{\{X_\tau\}_{\tau > T} \mid X_T = x\}$ stochastically dominates $\{\{\|N(\tau)\|\}_{\tau > T} \mid \|N(T)\| = y\}$.

Proof. Let us start with the observation that $\sum_{k=1}^{n} \sigma_k$ is the first time $\tau$ for which $X_\tau = n + 1$. Let $\{r_n\}_{n=0}^{\infty}$ be i.i.d. exponential random variables with mean 1. Then $\sum_{k=1}^{n} \sigma_k$ has the same distribution as $\sum_{k=0}^{n-1} r_k/(2k + 2)$. The time $\tau_n$ at which the node $n$ is born has the same distribution as $\sum_{k=0}^{n-1} r_k/W(k)$, where $W(k)$ denotes the combined attractiveness of all nodes at the random time $\tau_k$. The claim follows now from the observation that $W(k) \le 2k + 1 < 2k + 2$. □

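The main purpose of this section is the proof of the following claims.

Claim 2. Let $X$ be a standard birth process with rate $\rho$. Then $X_\tau$ is almost surely finite for every $\tau$. Furthermore, there exists a constant $C_s = C_s(\rho)$ such that for every $\tau_2 > \tau_1$, $x \ge 1$, and $k > 1$,

$$P\left(X_{\tau_2} > kx\,e^{\rho(\tau_2 - \tau_1)} \mid X_{\tau_1} = x\right) \le \frac{C_s}{(k-1)^2 x}. \tag{13}$$

If, in addition, $\tau_2 - \tau_1 < 1$, then

$$P\left(X_{\tau_2} - X_{\tau_1} > kx\left[e^{\rho(\tau_2 - \tau_1)} - 1\right] \mid X_{\tau_1} = x\right) \le \frac{C_s}{(k-1)^2 x\,(\tau_2 - \tau_1)}. \tag{14}$$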

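To see the finiteness of $X_\tau$, we need to show that $\sum_{n=1}^{\infty} \sigma_n = \infty$ a.s. This follows from the following simple argument: For every $k$, let

$$U_k = \sum_{j=2^k+1}^{2^{k+1}} \sigma_j.$$

For $j \in [2^k + 1, 2^{k+1}]$, with probability greater than $\frac{1}{2}$, $\sigma_j > \frac{1}{2\rho} 2^{-k-1}$. Since at least half of these $2^k$ independent variables exceed this value with probability at least $\frac{1}{2}$, we get $P\left(U_k > \frac{1}{8\rho}\right) \ge \frac{1}{2}$. The random variables $\{U_k\}_{k=1}^{\infty}$ are independent, and therefore $\sum_{n=1}^{\infty} \sigma_n \ge \sum_{k=1}^{\infty} U_k = \infty$ almost surely.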

To see (13) and (14), we use the following Lemma, which is proved in section II of [22]. Since the proof is short and simple, we choose to include it for the sake of making the exposition more self-contained.

Lemma 4 (Yule, 1924). For every $\tau > 0$ and every positive integer $k$, $\mathbb{E}(X_\tau^k) < \infty$. Furthermore,

$$\mathbb{E}(X_\tau) = \exp(\rho\tau), \tag{15}$$

and

$$\mathrm{var}(X_\tau) = \exp(2\rho\tau) - \exp(\rho\tau). \tag{16}$$

In particular,

$$\mathrm{var}(X_\tau) = O(\exp(2\rho\tau)), \tag{17}$$

and for $\tau < 1$ there exists a constant $C_v = C_v(\rho)$ so that

$$\mathrm{var}(X_\tau) \le C_v \tau. \tag{18}$$

³ The name "standard birth process" is due to the fact that $X_\tau$ is equivalent to the following process: Start with one cell at time 0. At each time, every cell divides into two cells with rate $\rho$. Then $X_\tau$ is the number of cells at time $\tau$.

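Proof. An equivalent description of the standard birth process is the following: Let $\alpha$ be an exponential variable with expected value $\rho$, and let $\{G_\tau\}$ be a Poisson point process with rate $\alpha e^{\rho\tau}$. Then $X_\tau = 1 + G_\tau$ has the same distribution as the standard birth process. To see this, all we need is to show that for every $\tau$ and $n$, the rate of the process $\{G_\tau\}$ at time $\tau$ conditioned on $X_\tau = n$ is $\rho n$. Indeed,

$$\begin{aligned}
\mathrm{rate}(\tau \mid X_\tau = n) &= \frac{\int_0^\infty \alpha e^{\rho\tau}\, P(X_\tau = n \mid \alpha)\, \frac{1}{\rho} e^{-\alpha/\rho}\, d\alpha}{\int_0^\infty P(X_\tau = n \mid \alpha)\, \frac{1}{\rho} e^{-\alpha/\rho}\, d\alpha} \\
&= \frac{\int_0^\infty \alpha e^{\rho\tau}\, e^{-\frac{\alpha}{\rho}(\exp(\rho\tau) - 1)} \left(\frac{\alpha}{\rho}(\exp(\rho\tau) - 1)\right)^{n-1} ((n-1)!)^{-1}\, \frac{1}{\rho} e^{-\alpha/\rho}\, d\alpha}{\int_0^\infty e^{-\frac{\alpha}{\rho}(\exp(\rho\tau) - 1)} \left(\frac{\alpha}{\rho}(\exp(\rho\tau) - 1)\right)^{n-1} ((n-1)!)^{-1}\, \frac{1}{\rho} e^{-\alpha/\rho}\, d\alpha} \\
&= \rho n.
\end{aligned}$$

Here the second equality follows from the fact that, given $\alpha$, $X_\tau - 1$ is a Poisson variable with rate $\frac{\alpha}{\rho}(e^{\rho\tau} - 1)$, and the last equality follows by integration by parts.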

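From this we get that the distribution of $X_\tau$ is geometric with expected value $\exp(\rho\tau)$. To see this, we again use the fact that $X_\tau - 1$ is a Poisson variable with rate $\frac{\alpha}{\rho}(e^{\rho\tau} - 1)$ where $\alpha$ is an exponential variable with expectation $\rho$. Therefore, for every $n$,

$$\begin{aligned}
P(X_\tau = n + 1) &= \int_0^\infty P(X_\tau - 1 = n \mid \alpha)\, \frac{1}{\rho} e^{-\alpha/\rho}\, d\alpha \\
&= (n!)^{-1} \int_0^\infty e^{-\frac{\alpha}{\rho}(e^{\rho\tau} - 1)} \left(\frac{\alpha}{\rho}(e^{\rho\tau} - 1)\right)^{n} \frac{1}{\rho} e^{-\alpha/\rho}\, d\alpha \\
&= (1 - e^{-\rho\tau})\, P(X_\tau = n),
\end{aligned}$$

where, again, the last step follows from integration by parts.

The relations (15) and (16) follow immediately, and (17) and (18) follow from (16). □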
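As a sanity check on (15) and (16), the moments of the geometric law can be summed numerically. The sketch below assumes the parametrization $P(X_\tau = n) = e^{-\rho\tau}(1 - e^{-\rho\tau})^{n-1}$ for $n \ge 1$, which is what the recursion above gives once normalized; the function name and the truncation level `n_max` are illustrative choices.

```python
import math


def geometric_moments(rho, tau, n_max=20000):
    """Mean and variance of the law P(X = n) = q (1 - q)^(n - 1), q = exp(-rho*tau).

    The series is truncated at n_max; the neglected tail is geometrically small.
    """
    q = math.exp(-rho * tau)
    mean = 0.0
    second_moment = 0.0
    prob = q                      # P(X = 1)
    for n in range(1, n_max + 1):
        mean += n * prob
        second_moment += n * n * prob
        prob *= (1.0 - q)         # P(X = n + 1) = (1 - q) P(X = n)
    return mean, second_moment - mean * mean
```

With $\rho = 2$ and $\tau = 0.5$, the two returned numbers can be compared against $\exp(\rho\tau)$ and $\exp(2\rho\tau) - \exp(\rho\tau)$.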

Proof of (13) and (14) in Claim 2. Equations (13) and (14) will follow from Chebyshev's inequality if we show that

$$\mathbb{E}(X_{\tau_2} \mid X_{\tau_1}) = X_{\tau_1} e^{\rho(\tau_2 - \tau_1)} \tag{19}$$

and

$$\mathrm{var}(X_{\tau_2} \mid X_{\tau_1}) = X_{\tau_1}\, O\left(e^{2\rho(\tau_2 - \tau_1)}\right) \tag{20}$$

for $\tau_2 > \tau_1$, and

$$\mathrm{var}(X_{\tau + \tau_1} \mid X_{\tau_1}) = O(\tau) \cdot X_{\tau_1} \tag{21}$$

for $\tau < 1$.

Equations (19), (20) and (21) follow from (respectively) (15), (17) and (18) and the fact that, conditioned on $X_{\tau_1}$, the process $X_{\tau + \tau_1}$ is the sum of $X_{\tau_1}$ independent copies of $X_\tau$. □

Remark: From now on we will always assume that $\rho = 2$. In particular, whenever we use the term "standard birth process", it should be understood as "standard birth process with rate 2".

3.5 Concentration of the continuous-time process 

In order to show concentration of the degree distribution for the continuous-time process, we will first prove the following lemma. To state it, we observe that for any $b$ with $b^T p = 0$,

$$\|b^T e^{(T - \tau)M}\|_\infty \le \|b\|_\infty\, e^{v'(T - \tau)} \tag{22}$$

for some $v' < w$. Without loss of generality, we may assume that $v' > w/2$. Also, for a general vector $b$,

$$\|b^T e^{(T - \tau)M}\|_\infty \le \|b\|_\infty\, e^{w(T - \tau)}. \tag{23}$$

Lemma 5. Let $b$ be a vector in $\mathbb{R}^A$ with $\|b\|_\infty \le 1$. Then there exists a constant $C < \infty$ such that for all $T > 0$,

$$\mathrm{var}\left(b^T N(T)\right) \le C \exp(2uT), \tag{24}$$

where $u = w$ if $b^T p \ne 0$, and $u = v'$ if $b^T p = 0$.

Proof. We use a martingale to bound the variance. Fix $T$, and let

$$L_\tau = \mathbb{E}\left(b^T N(T) \mid N(\tau)\right).$$

Clearly, $L_\tau$ is a (continuous-time) martingale. By (12), we know that $L_\tau = b^T e^{(T - \tau)M} N(\tau)$. Let $0 < \varepsilon < \exp(-10T)$ be such that $K = T/\varepsilon$ is an integer. Then $\{U_k = L_{k\varepsilon}\}_{k=0}^{K}$ is a martingale and

$$\mathrm{var}\left(b^T N(T)\right) = \sum_{k=0}^{K-1} \mathrm{var}(U_{k+1} - U_k).$$

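We want to estimate the variance of $U_{k+1} - U_k$. Let $v_k = N((k+1)\varepsilon) - N(k\varepsilon)$. For two vectors $N_1$ and $N_2$,

$$\left(b^T e^{(T - (k+1)\varepsilon)M} N_1 - b^T e^{(T - (k+1)\varepsilon)M} N_2\right)^2 \le \|N_1 - N_2\|^2\, e^{2u(T - (k+1)\varepsilon)},$$

where the norm $\|\cdot\|$ refers to the $L^1$-norm here and throughout this section, unless otherwise noted. Choose $N(k\varepsilon)$ according to its distribution, and let $N_1$ and $N_2$ be chosen independently, according to the distribution of $N((k+1)\varepsilon)$ conditioned on $N(k\varepsilon)$. Then

$$\mathrm{var}(U_{k+1} - U_k) = \frac{1}{2}\, \mathbb{E}\left[\left(b^T e^{(T - (k+1)\varepsilon)M} N_1 - b^T e^{(T - (k+1)\varepsilon)M} N_2\right)^2\right] \le \frac{1}{2}\, \mathbb{E}\left(\|N_1 - N_2\|^2\right) e^{2u(T - (k+1)\varepsilon)}.$$

On the other hand, using the fact that for every vector $x$ in $\mathbb{R}^d$,

$$\left(\sum_{i=1}^{d} x_i\right)^2 \le d \sum_{i=1}^{d} x_i^2,$$

we get

$$\sum_{j=1}^{A} \mathrm{var}(v_k(j)) \ge \frac{1}{2A}\, \mathbb{E}\left(\|N_1 - N_2\|^2\right),$$

where $v_k(j)$ is the $j$-th component of $v_k$. Therefore,

$$\mathrm{var}(U_{k+1} - U_k) \le A \sum_{j=1}^{A} \exp\left[2u(T - (k+1)\varepsilon)\right] \mathrm{var}(v_k(j)).$$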



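By Claim 1, (18), and (21), for every $j = 1, 2, \ldots, A$,

$$\mathrm{var}\left(v_k(j) \mid N(k\varepsilon)\right) \le C_v\, \varepsilon\, \|N(k\varepsilon)\|$$

and

$$\left|\mathbb{E}\left(v_k(j) \mid N(k\varepsilon)\right)\right| \le (e^{2\varepsilon} - 1)\, \|N(k\varepsilon)\| \le 4\varepsilon\, \|N(k\varepsilon)\|.$$

Therefore,

$$\begin{aligned}
\mathrm{var}(v_k(j)) &= \mathbb{E}\left(\mathrm{var}\left(v_k(j) \mid N(k\varepsilon)\right)\right) + \mathrm{var}\left(\mathbb{E}\left(v_k(j) \mid N(k\varepsilon)\right)\right) \\
&\le C_v\, \varepsilon\, \mathbb{E}\left(\|N(k\varepsilon)\|\right) + 16\varepsilon^2\, \mathbb{E}\left(\|N(k\varepsilon)\|^2\right) \\
&\le C_v\, \varepsilon \exp(wk\varepsilon) + 16\varepsilon^2 \exp(4k\varepsilon) \\
&\le C_0\, \varepsilon \exp(wk\varepsilon)
\end{aligned}$$

for $C_0 = C_v + 1$, by the choice of $\varepsilon$. Therefore,

$$\begin{aligned}
\mathrm{var}\left(b^T N(T)\right) &\le \sum_{k=0}^{K-1} A^2 C_0\, \varepsilon \exp\left(wk\varepsilon + 2u(T - (k+1)\varepsilon)\right) \\
&\le A^2 C_0\, e^{2uT} \int_0^\infty e^{(w - 2u)\tau}\, d\tau \le C_u \exp(2uT)
\end{aligned}$$

for

$$C_u = A^2 C_0 \int_0^\infty e^{(w - 2u)\tau}\, d\tau < \infty.$$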



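In addition, note that by (22) and (23),

$$\left|\mathbb{E}\left(b^T N(T)\right)\right| \le e^{uT}, \tag{25}$$

and therefore there exists $C$ so that

$$\mathbb{E}\left[\left(b^T N(T)\right)^2\right] \le C e^{2uT}. \tag{26}$$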

We are now ready to state and prove the two main lemmas used to prove concentration:

Lemma 6. For every $w' < w$ and every $1 \le k \le A$, a.s. for every $\tau$ large enough,

$$N_k(\tau) > e^{w'\tau}. \tag{27}$$

Lemma 7. There exists $v < w$ s.t. for every $1 \le k < j \le A$, a.s. for every $\tau$ large enough,

$$|p_j N_k(\tau) - p_k N_j(\tau)| < e^{v\tau},$$

where $p_i$, $i = 1, \ldots, A$, are the components of the vector $p$.

The following corollary is an immediate consequence of Claim 2, Claim 1 and Lemma 6.

Corollary 8. $w \le 2$.

Proof of Lemma 7. Choose some $v$ strictly between $v'$ and $w$ in a way that $w - v < 0.25 \min(0.1, v - v', w/10)$ and let $\delta = \min(0.1, v - v', w/10)$. The vector

$$b_i = \begin{cases} p_j & \text{if } i = k \\ -p_k & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

satisfies $b^T p = 0$, and therefore, using (26) and Markov's inequality,

$$P\left(|p_j N_k(T) - p_k N_j(T)| = |b^T N(T)| > \tfrac{1}{3} e^{vT}\right) \le 9C e^{-2\delta T}. \tag{28}$$

Let $\{T_i\}_{i=1,2,\ldots}$ be such that $e^{2\delta T_i} = i^2$. By Borel-Cantelli, almost surely there exists $i_0$ such that for all $i > i_0$,

$$|p_j N_k(T_i) - p_k N_j(T_i)| < \tfrac{1}{3} e^{vT_i}. \tag{29}$$

Note that

$$T_i = \frac{1}{\delta} \log i$$

and therefore

$$T_{i+1} - T_i = \Theta(i^{-1}). \tag{30}$$
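We want to show that almost surely for all $T$ large enough,

$$|p_j N_k(T) - p_k N_j(T)| < e^{vT}. \tag{31}$$

Section 3.3 tells us that $\mathbb{E}(\|N(T_i)\|) = O(\exp(wT_i))$, and Lemma 5 tells us that $\mathrm{var}(\|N(T_i)\|) = O(\exp(2wT_i))$. Therefore

$$P\left(\|N(T_i)\| > e^{(w + 0.6\delta)T_i}\right) \le C_1 e^{-1.2\delta T_i} = C_1 i^{-1.2}$$

for some constant $C_1$, so that, if $m(i)$ is the number of vertices arriving between $T_i$ and $T_{i+1}$, then

$$\begin{aligned}
P\left(m(i) > \tfrac{1}{3} e^{vT_i}\right) &\le P\left(\|N(T_i)\| > e^{(w + 0.6\delta)T_i}\right) + P\left(m(i) > \tfrac{1}{3} e^{vT_i} \,\Big|\, \|N(T_i)\| \le e^{(w + 0.6\delta)T_i}\right) \\
&\le P\left(\|N(T_i)\| > e^{(w + 0.6\delta)T_i}\right) + P\left(m(i) > \tfrac{1}{3} e^{vT_i} \,\Big|\, \|N(T_i)\| = e^{(w + 0.6\delta)T_i}\right) \\
&\le C_1 i^{-1.2} + C_s\, e^{-(w + 0.6\delta)T_i} (T_{i+1} - T_i)^{-1},
\end{aligned} \tag{32}$$

where the last inequality uses (14) in Claim 2 and the fact that

$$\tfrac{1}{3} e^{vT_i} > 2 e^{(w + 0.6\delta)T_i} \left(\exp(2(T_{i+1} - T_i)) - 1\right)$$

for $i$ large enough.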

Clearly, the first part of the right side of (32) is a convergent sum. We need to show that so is the second part. Remember the choice $\delta < w/10$. Then, using (30),

$$\begin{aligned}
C_s\, e^{-(w + 0.6\delta)T_i} (T_{i+1} - T_i)^{-1} &= \Theta\left(e^{\delta T_i} \cdot e^{-(w + 0.6\delta)T_i}\right) \\
&= \Theta\left(e^{-(w - 0.4\delta)T_i}\right) \\
&= O\left(e^{-9\delta T_i}\right) \\
&= O(i^{-9}).
\end{aligned}$$

Using Borel-Cantelli, we conclude that almost surely,

$$\sum_{k=1}^{A} |N_k(T) - N_k(T_i)| < \tfrac{2}{3} e^{vT_i} \tag{33}$$

for all $i$ large enough and all $T$ between $T_i$ and $T_{i+1}$. Equation (31) follows from (33). □



Proof of Lemma 6. By Lemma 5, $\mathrm{var}(N_1(\tau)) \le C_1 e^{2w\tau}$, while $\mathbb{E}(N_1(\tau)) \ge C_2 e^{w\tau}$ by Section 3.3. Therefore there exists a constant $\rho > 0$ such that

$$P\left(N_1(\tau) > \rho e^{w\tau}\right) > \rho. \tag{34}$$

Fix some large $T$, and let $T_i = iT$. For each vertex $v$ which is a fertile leaf at time $T_{i-1}$, let $\xi_v$ denote the number of descendants of $v$ (including $v$ itself) at time $T_i$ which are fertile leaves. The random variables $\{\xi_v\}$ are independent, their sum is $N_1(T_i)$, and the distribution of each of them is the same as the unconditional distribution of $N_1(T)$. Using this fact and (34), we get

$$P\left(N_1(T_i) > \frac{\rho^2}{2} e^{wT} N_1(T_{i-1}) \,\Big|\, N_1(T_{i-1})\right) > 1 - e^{-\frac{\rho}{8} N_1(T_{i-1})} \tag{35}$$

via Chernoff's bound. From (35), we get that almost surely there exists a constant $C_3 > 0$ such that, for all $i$ large enough,

$$N_1(T_i) > C_3 \exp\left(i\left[wT + \log\left(\frac{\rho^2}{2}\right)\right]\right).$$

From Lemma 7 we may conclude that the same holds for $N_A(T_i)$, i.e. for any constant $C_4 < C_3$,

$$N_A(T_i) > C_4 \exp\left(i\left[wT + \log\left(\frac{\rho^2}{2}\right)\right]\right).$$

$N_A(\tau)$ is monotone increasing, and therefore there exists $C_5 > 0$ such that

$$N_A(\tau) > C_5 \exp\left(\tau\left[w + \frac{1}{T} \log\left(\frac{\rho^2}{2}\right)\right]\right) \tag{36}$$

for all $\tau$ large enough. Using Lemma 7 again, we conclude that there exists $C_6 > 0$ such that

$$N_k(\tau) > C_6 \exp\left(\tau\left[w + \frac{1}{T} \log\left(\frac{\rho^2}{2}\right)\right]\right)$$

for all $k$ and large enough $\tau$. We get (27) by taking $T$ so large that

$$w + \frac{1}{T} \log\left(\frac{\rho^2}{2}\right) > w'.$$

□

Proposition 9. For every $k$ and $j$, almost surely

$$\lim_{\tau \to \infty} \frac{N_k(\tau)}{N_j(\tau)} = \frac{p_k}{p_j}. \tag{37}$$

Proof. This follows immediately from Lemma 6 and Lemma 7. □

3.6 Back to discrete time 

Proposition 10. For the discrete-time process, and $A > \max\{A_1, A_2\}$, there exists a vector $q$ such that, for $k < A$, we have

$$\lim_{t \to \infty} \frac{N_k(t)}{t + 1} = q_k. \tag{38}$$

Proof. The number of infertile vertices increases at step $t$ with probability

$$\frac{\sum_{k=1}^{A_1 - 1} \min(k, A_2)\, N_k(t)}{\sum_{k=1}^{A} \min(k, A_2)\, N_k(t)}$$

(their number cannot decrease). However, by (37), this expression tends to a limit, and therefore, using the law of large numbers,

$$\lim_{t \to \infty} \frac{N_0(t)}{t + 1} = q_0 \quad \text{a.s.} \tag{39}$$

Using (37) once more, the proposition now follows for $k \ge 1$ with $q_k = (1 - q_0)\, p_k$. □

Note that the above proposition implies that $q_k$ and hence $p_k$ is independent of $A$ if $A > k$, since the left hand side of (38) does not depend on $A$ if $A > k$. So, in particular, $p_1$ does not depend on $A$.

4 Power Law With a Cutoff 

In the previous section, we saw that for every $A > \max\{A_1, A_2\}$, the limiting proportions up to $A - 1$ are $\lambda p$ where $p$ is the eigenvector corresponding to the highest eigenvalue $w$ of the $A$-by-$A$ matrix $M$ defined in (11). Therefore, the components $p_1, p_2, \ldots, p_A$ of the vector $p$ satisfy the equation:

$$w p_i = -\min(i, A_2)\, p_i + \min(i - 1, A_2)\, p_{i-1}, \qquad i \ge 2, \tag{40}$$

where the normalization is determined by $\sum_{i=1}^{A} p_i = 1$. From (40) we get that for $i \le A_2$,

$$p_i = \left(\prod_{k=2}^{i} \frac{k - 1}{k + w}\right) p_1, \tag{41}$$

and for $i > A_2$,

$$p_i = \left(\frac{A_2}{A_2 + w}\right)^{i - A_2} p_{A_2}. \tag{42}$$

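Clearly, (42) is exponentially decaying. There are many ways to see that (41) behaves like a power law with degree $1 + w$. Indeed,

$$\begin{aligned}
\prod_{k=2}^{i} \frac{k - 1}{k + w} &= \exp\left(\sum_{k=2}^{i} \log\frac{k - 1}{k + w}\right) = \exp\left((-1 - w) \sum_{k=2}^{i} (k + w)^{-1} + O(1)\right) \\
&= \exp\left((-1 - w) \sum_{k=2}^{i} k^{-1} + O(1)\right) = \exp\left((-1 - w) \sum_{k=2}^{i} \log\frac{k + 1}{k} + O(1)\right) \\
&= \exp\left((-1 - w) \log(i/2) + O(1)\right) = \Theta(1)\, i^{-1-w}.
\end{aligned} \tag{43}$$

Note that the constants implicit in the $O(\cdot)$ symbols do not depend on $A_1$, $A_2$ or $i$, due to the fact that $0 < w \le 2$. Equation (43) can be stated in the following way:

Proposition 11. There exist $0 < c < C < \infty$ such that for every $A_1$, $A_2$ and $i \le A_2$, if $w = w(A_1, A_2)$ is as in (40), then

$$c\, i^{-1-w} \le \frac{p_i}{p_1} \le C\, i^{-1-w}. \tag{44}$$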
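The two-sided bound (44) can be illustrated numerically. The sketch below relies on the identity $\prod_{k=2}^{i} \frac{k-1}{k+w} = \Gamma(i)\,\Gamma(2+w)/\Gamma(i+1+w)$, which follows from the recursion behind (41) but is not stated in the text; the function names are hypothetical.

```python
import math


def ratio_direct(i, w):
    """p_i / p_1 computed as the product of (k - 1)/(k + w) for k = 2..i."""
    out = 1.0
    for k in range(2, i + 1):
        out *= (k - 1) / (k + w)
    return out


def ratio_gamma(i, w):
    """Same ratio via Gamma(i) Gamma(2 + w) / Gamma(i + 1 + w), using log-gamma."""
    return math.exp(math.lgamma(i) + math.lgamma(2 + w) - math.lgamma(i + 1 + w))


def scaled_ratio(i, w):
    """(p_i / p_1) * i^(1 + w); (44) says this stays between two positive constants."""
    return ratio_gamma(i, w) * i ** (1 + w)
```

Evaluating `scaled_ratio(i, w)` over a range of `i` for a fixed `w` in $(0, 2]$ shows the quantity staying in a bounded window, which is the content of Proposition 11 for indices below the cutoff.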

The vector $(q_1, q_2, \ldots, q_{A-1})$ is a scalar multiple of the vector $(p_1, p_2, \ldots, p_{A-1})$, so the corresponding equations in Theorem 3 (and the comment immediately following it) are consequences of equations (41), (42), and (44) derived above. It remains to prove the normalization conditions

$$\sum_{i=0}^{\infty} q_i = 1 \quad \text{and} \quad q_0 = \sum_{i=1}^{\infty} q_i \min(i - 1, A_1 - 1)$$

stated in Theorem 3. These follow from the equations

$$\sum_{i=0}^{\infty} N_i(t) = t + 1 \quad \text{and} \quad N_0(t) = \sum_{i=1}^{\infty} N_i(t) \min(i - 1, A_1 - 1).$$

The first of these simply says that there are $t + 1$ vertices at time $t$; the second equation is proved by counting the number of infertile children of each fertile node.



5 Monotonicity Properties of w 

In this section we will prove that the exponent $1 + w$ of the power law in Proposition 11 is monotonically decreasing in $A_1$ and monotonically increasing in $A_2$. For this purpose, it will be useful to define a family of matrices, parameterized by two vectors $y, z \in \mathbb{R}^n$, which generalizes the matrix $M$ appearing in (11), whose top eigenvalue is $w$.

Given vectors $y = (y_1, y_2, \ldots, y_n)$, $z = (z_1, z_2, \ldots, z_n) \in \mathbb{R}^n$, let $M(y, z)$ denote the $n$-by-$n$ matrix whose $(i,j)$-th entry is:

$$M_{ij}(y, z) = \begin{cases} z_1 - y_1 & \text{if } 1 = i = j \\ -y_j & \text{if } 2 \le i = j \le n \\ y_j & \text{if } 2 \le i = j + 1 \le n \\ z_j & \text{if } i = 1 \text{ and } j \ge 2 \\ 0 & \text{otherwise.} \end{cases}$$

Thus, for instance, the matrix $M$ defined in (11) is $M(y, z)$, where $n = A$ and

$$y_j = \begin{cases} \min(j, A_2) & \text{if } 1 \le j < A \\ 0 & \text{if } j = A, \end{cases} \qquad z_j = \begin{cases} 0 & \text{if } 1 \le j < A_1 \\ \min(j, A_2) & \text{if } A_1 \le j \le A. \end{cases}$$

For the remainder of this section, we will assume: 

• $y_i > 0$ for $1 \le i < n$, (45)

• $z_i \ge 0$ for $1 \le i \le n$, (46)

• $y_n = 0$, $z_n > 0$. (47)

All of these criteria will be satisfied by the matrices $M(y, z)$ which arise in proving the desired monotonicity claim. It follows from (45), (46), and (47) that if we add a suitably large scalar multiple of the identity matrix to $M(y, z)$, we obtain an irreducible matrix $M(y, z) + BI$ with non-negative entries. The Perron-Frobenius Theorem guarantees that $M(y, z) + BI$ has a positive real eigenvalue $R$ of multiplicity 1, such that all other complex eigenvalues have modulus $\le R$; consequently $M(y, z)$ has a real eigenvalue $w = R - B$, of multiplicity 1, such that the real part of every other eigenvalue is strictly less than $w$.

We will study how $w$ varies under perturbations of the parameters $y, z$. Let $P(\lambda, y, z)$ be the characteristic polynomial of $M(y, z)$, i.e.

$$P(\lambda, y, z) = \det(\lambda I - M(y, z)).$$

This is a polynomial of degree $n$ in $\lambda$ (with coefficients depending smoothly on $y, z$), whose largest real root $w(y, z)$ exists and has multiplicity 1, provided $(y, z)$ belongs to the region $V \subset \mathbb{R}^n \times \mathbb{R}^n$ determined by (45), (46), and (47). It follows from the implicit function theorem that $w(y, z)$ is a smooth function of $(y, z)$ in $V$, satisfying:

$$\frac{\partial P}{\partial y_i} + \frac{\partial w}{\partial y_i} \frac{\partial P}{\partial \lambda} = 0; \qquad \frac{\partial P}{\partial z_i} + \frac{\partial w}{\partial z_i} \frac{\partial P}{\partial \lambda} = 0, \tag{48}$$

where the partial derivatives of $P$ are evaluated at $(w, y, z)$.

If $x$ is any vector in $\mathbb{R}^n \times \mathbb{R}^n$, and $\partial_x$ is the corresponding directional derivative operator, we have from (48)

$$\partial_x w(y, z) = -\frac{\partial_x P(w, y, z)}{(\partial P/\partial \lambda)|_{(w, y, z)}}. \tag{49}$$

We know that $(\partial P/\partial \lambda)|_{(w, y, z)} > 0$ because $P$ is a polynomial with positive leading coefficient, $w$ is its largest real root, and $w$ has multiplicity 1. Thus we have established:

Claim 3. For any vector $x \in \mathbb{R}^n \times \mathbb{R}^n$, and any $(y, z) \in V$, put $w = w(y, z)$. Then the directional derivatives $\partial_x w(y, z)$ and $\partial_x P(w, y, z)$ have opposite signs.

This allows monotonicity properties of $w$ to be deduced from calculations involving directional derivatives of $P$. Given the definition of $M(y, z)$, it is straightforward to compute that

$$P(\lambda, y, z) = \det(\lambda I - M(y, z)) = P_1(\lambda, y, z) - \sum_{j=2}^{n} P_j(\lambda, y, z), \tag{50}$$

where

$$P_1(\lambda, y, z) = (\lambda + y_1 - z_1) \prod_{i=2}^{n} (\lambda + y_i), \tag{51}$$

$$P_j(\lambda, y, z) = \left(\prod_{i=1}^{j-1} y_i\right) z_j \left(\prod_{i=j+1}^{n} (\lambda + y_i)\right). \tag{52}$$

As an easy consequence of this formula, $w$ is strictly positive.

Lemma 12. $w$ is strictly positive.

Proof. From (50)-(52) and the fact that $y_n = 0$, we have $P(0, y, z) = -P_n(0, y, z) = -\left(\prod_{i=1}^{n-1} y_i\right) z_n$, and this is strictly negative by (45) and (47). For sufficiently large positive $\lambda$, we know that $P(\lambda, y, z) > 0$ because $P$ is a polynomial whose leading coefficient in $\lambda$ is positive. By the intermediate value theorem, $P(\lambda, y, z)$ has a strictly positive real root. □
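The expansion (50)-(52) can be verified on concrete inputs with exact rational arithmetic. The following sketch is an illustration under the entry definition of $M(y, z)$ given earlier in this section; the helper names are hypothetical and the determinant is computed by plain Gaussian elimination over `Fraction`.

```python
from fractions import Fraction


def build_M(y, z):
    """Matrix M(y, z): z_1 - y_1 at (1,1), -y_j on the diagonal, y_j below it, z_j in row 1."""
    n = len(y)
    M = [[Fraction(0)] * n for _ in range(n)]
    M[0][0] = Fraction(z[0] - y[0])
    for j in range(2, n + 1):
        M[j - 1][j - 1] = Fraction(-y[j - 1])     # 2 <= i = j <= n
        M[0][j - 1] = Fraction(z[j - 1])          # i = 1, j >= 2
    for j in range(1, n):
        M[j][j - 1] = Fraction(y[j - 1])          # i = j + 1
    return M


def det(mat):
    """Exact determinant by Gaussian elimination with row swaps."""
    a = [row[:] for row in mat]
    n = len(a)
    d = Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if a[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def char_poly_direct(lam, y, z):
    """det(lam * I - M(y, z)) evaluated at the number lam."""
    n = len(y)
    M = build_M(y, z)
    return det([[(lam if i == j else 0) - M[i][j] for j in range(n)]
                for i in range(n)])


def char_poly_formula(lam, y, z):
    """P_1 - sum_{j >= 2} P_j, following (50)-(52)."""
    n = len(y)

    def prod(values):
        out = Fraction(1)
        for v in values:
            out *= v
        return out

    total = (lam + y[0] - z[0]) * prod(lam + y[i] for i in range(1, n))
    for j in range(2, n + 1):
        total -= prod(y[i] for i in range(j - 1)) * z[j - 1] * prod(
            lam + y[i] for i in range(j, n))
    return total
```

Comparing `char_poly_direct` and `char_poly_formula` at a few rational values of `lam` checks the identity on a given pair `(y, z)`; evaluating at `lam = 0` with `y[-1] == 0` reproduces the sign argument used in the proof of Lemma 12.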

The following three lemmas encapsulate the requisite directional derivative estimates for P. 

Lemma 13. $(\partial P/\partial z_k)|_{(w, y, z)} < 0$ for $(y, z) \in V$.

Proof. For $k > 1$,

$$\frac{\partial P}{\partial z_k} = -\frac{\partial P_k}{\partial z_k} = -\left(\prod_{i=1}^{k-1} y_i\right) \left(\prod_{i=k+1}^{n} (w + y_i)\right) < 0.$$

For $k = 1$,

$$\frac{\partial P}{\partial z_1} = \frac{\partial P_1}{\partial z_1} = -\prod_{i=2}^{n} (w + y_i) < 0.$$

□

Corollary 14. $w$ is monotonically decreasing in $A_1$.

Proof. Increasing $A_1$ from $k$ to $k + 1$ has no effect on $y$, and its only effect on $z$ is to decrease $z_k$ from $\min(k, A_2)$ to 0. As we move in the $-z_k$ direction, the directional derivative of $P$ is positive, so the directional derivative of $w$ is negative by Claim 3. Thus $w$ decreases as we increase $A_1$ from $k$ to $k + 1$. □

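Lemma 15. For $1 \le k < n$, $(\partial P/\partial y_k)|_{(w, y, z)} < 0$ if $(y, z) \in V$ and $z_k = 0$.

Proof.

$$\begin{aligned}
\frac{\partial P}{\partial y_k} &= \frac{\partial P_1}{\partial y_k} - \sum_{j=2}^{n} \frac{\partial P_j}{\partial y_k} \\
&= \frac{1}{w + y_k} P_1 - \frac{1}{w + y_k} \sum_{j=2}^{k-1} P_j - \frac{1}{y_k} \sum_{j=k+1}^{n} P_j \\
&< \frac{1}{w + y_k} P_1 - \frac{1}{w + y_k} \sum_{j=2}^{k-1} P_j - \frac{1}{w + y_k} \sum_{j=k+1}^{n} P_j \\
&= \frac{P(w, y, z)}{w + y_k} = 0.
\end{aligned}$$

Here the last line uses $P_k = 0$, which holds because $z_k = 0$. □
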

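Lemma 16. For $k > 1$, $(\partial P/\partial y_k + \partial P/\partial z_k)|_{(w, y, z)} < 0$ if $(y, z) \in V$ and $y_k = z_k$.

Proof.

$$\begin{aligned}
\frac{\partial P}{\partial y_k} + \frac{\partial P}{\partial z_k} &= \frac{\partial P_1}{\partial y_k} - \sum_{j=2}^{n} \frac{\partial P_j}{\partial y_k} - \frac{\partial P_k}{\partial z_k} \\
&= \frac{1}{w + y_k} P_1 - \frac{1}{w + y_k} \sum_{j=2}^{k-1} P_j - \frac{1}{y_k} \sum_{j=k+1}^{n} P_j - \frac{1}{z_k} P_k \\
&< \frac{1}{w + y_k} P_1 - \frac{1}{w + y_k} \sum_{j=2}^{k-1} P_j - \frac{1}{w + y_k} \sum_{j=k+1}^{n} P_j - \frac{1}{w + y_k} P_k \\
&= \frac{P(w, y, z)}{w + y_k} = 0.
\end{aligned}$$

□
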



Corollary 17. $w$ is monotonically increasing in $A_2$.

Proof. If we change $A_2$ from $k$ to $k + 1$, this changes $y$ into a new vector $y'$ satisfying

$$y'_j - y_j = \begin{cases} 1 & \text{if } k < j < n \\ 0 & \text{otherwise.} \end{cases}$$

It changes $z$ into a new vector $z'$ satisfying

$$z'_j - z_j = \begin{cases} 1 & \text{if } \max(A_1, k + 1) \le j \le n \\ 0 & \text{otherwise.} \end{cases}$$

Letting $e_j^{(y)}$ denote a unit vector in the $+y_j$ direction, and $e_j^{(z)}$ a unit vector in the $+z_j$ direction, the direction of change is expressed by the vector

$$x = (y', z') - (y, z) = \sum_{k+1 \le j < A_1} e_j^{(y)} + \sum_{\max(k+1, A_1) \le j < n} \left(e_j^{(y)} + e_j^{(z)}\right) + e_n^{(z)},$$

and $\partial_x P$ is negative, by the preceding three lemmas. By Claim 3 this means $w$ increases monotonically as we move along this path. □
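Both monotonicity statements (Corollary 14 and Corollary 17) can be observed numerically on small cases. The sketch below is illustrative only: it builds the vectors $y$ and $z$ from $(A, A_1, A_2)$ as in the example following the definition of $M(y, z)$, and approximates the top eigenvalue by power iteration on a shifted matrix. The function names and the iteration count are assumptions, not part of the original proof.

```python
def cutoff_vectors(A, A1, A2):
    """Vectors y, z for which M(y, z) is the matrix M of (11), with n = A."""
    y = [float(min(j, A2)) if j < A else 0.0 for j in range(1, A + 1)]
    z = [float(min(j, A2)) if j >= A1 else 0.0 for j in range(1, A + 1)]
    return y, z


def top_eigenvalue(y, z, iterations=5000):
    """Approximate the largest real eigenvalue w of M(y, z) by power iteration.

    The iteration is applied to M(y, z) + shift*I, which is non-negative
    with a strictly positive diagonal.
    """
    n = len(y)
    shift = max(y) + 1.0
    v = [1.0 / n] * n
    top = 0.0
    for _ in range(iterations):
        u = [0.0] * n
        # first row: (z_1 - y_1) on the diagonal and z_j for j >= 2
        u[0] = (z[0] - y[0]) * v[0] + sum(z[j] * v[j] for j in range(1, n))
        # rows i >= 2: y_{i-1} below the diagonal and -y_i on it
        for i in range(1, n):
            u[i] = y[i - 1] * v[i - 1] - y[i] * v[i]
        u = [u[i] + shift * v[i] for i in range(n)]
        top = sum(u)            # estimates w + shift because sum(v) == 1
        v = [x / top for x in u]
    return top - shift


def w_of(A, A1, A2):
    """Top eigenvalue w as a function of the cutoff parameters."""
    return top_eigenvalue(*cutoff_vectors(A, A1, A2))
```

For a fixed $A > \max\{A_1, A_2\}$, comparing `w_of(A, A1, A2)` across neighbouring values of `A1` or of `A2` gives a direct numerical view of the two corollaries.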